Feature extraction strategies in deep learning based acoustic event detection

Authors

Miquel Espi

Masakiyo Fujimoto

Keisuke Kinoshita

Tomohiro Nakatani

Abstract

Non-speech acoustic events differ significantly from one another and usually require access to detail-rich features. For this reason, directly modeling the spectrogram itself can provide a significant advantage over using predefined features, which usually compress and downsample detail, as is typically done in speech recognition. This paper focuses on the importance of feature extraction for deep learning based acoustic event detection, and more specifically on exploiting local spectro-temporal features of sounds. We do this in two ways: (1) outside the model, by using spectrograms of multiple resolutions simultaneously, based on the fact that the time-frequency detail trade-off depends on the resolution with which a spectrogram is computed (e.g. ‘steps’ require finer time resolution, while sounds that span many frequencies require finer frequency detail); and (2) with a model that implicitly exploits locality, convolutional neural networks (CNNs), which are a state-of-the-art 2D feature extraction model. An experimental evaluation shows that the presented approaches outperform a state-of-the-art deep learning baseline, with a noticeable gain in the CNN case, and provides insights regarding CNN-based spectrogram characterization.
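The time-frequency trade-off behind strategy (1) can be made concrete with a short sketch. It assumes NumPy, a 16 kHz sampling rate, Hann windows of 256, 1024 and 4096 samples, and a shared 128-sample hop; these values and the helper names `log_spectrogram`, `multi_resolution` and `conv2d_valid` are hypothetical illustrative choices, not the configuration used in the paper. A window of N samples at rate sr gives frequency bins sr / N Hz apart but spans N / sr seconds, so the 256-sample window localizes events within 16 ms using 62.5 Hz bins, while the 4096-sample window gives 3.9 Hz bins over 256 ms. The last function shows the local 2D filtering that a single CNN filter applies to a spectrogram (strategy 2); it is not the network architecture evaluated in the paper.

```python
import numpy as np


def log_spectrogram(signal, win_len, hop):
    """Log-magnitude STFT with a Hann window, shape (n_frames, win_len // 2 + 1).

    The signal is zero-padded by win_len // 2 on both sides so that every
    (even) window length yields the same number of frames for a given hop.
    """
    pad = win_len // 2
    x = np.pad(signal, (pad, pad))
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window for i in range(n_frames)])
    return np.log(np.abs(np.fft.rfft(frames, axis=1)) + 1e-8)


def multi_resolution(signal, win_lens=(256, 1024, 4096), hop=128):
    """One spectrogram per window length (hypothetical defaults).

    Short windows favor time detail; long windows favor frequency detail.
    """
    return {n: log_spectrogram(signal, n, hop) for n in win_lens}


def conv2d_valid(spec, kernel):
    """Single-filter 'valid' 2D cross-correlation over (time, frequency).

    Each output value depends only on a local time-frequency patch of the
    input, which is the locality a convolutional layer exploits.
    """
    patches = np.lib.stride_tricks.sliding_window_view(spec, kernel.shape)
    return np.einsum("ijkl,kl->ij", patches, kernel)


if __name__ == "__main__":
    sr = 16000  # assumed sampling rate
    t = np.arange(sr) / sr
    tone = np.sin(2 * np.pi * 1000 * t)  # 1 s test tone at 1 kHz
    specs = multi_resolution(tone)
    for n, s in specs.items():
        print(f"win={n:5d}  shape={s.shape}  bin width={sr / n:8.3f} Hz  "
              f"window={1000 * n / sr:6.1f} ms")
    # Hypothetical 3x3 kernel that responds to changes along the frequency axis.
    freq_edge = np.array([[-1.0, 0.0, 1.0]] * 3)
    print(conv2d_valid(specs[256], freq_edge).shape)
```

Because the frequency axes differ in size, the resolutions cannot be stacked directly as CNN input channels. Since all of them share the same frame count here, concatenating along the feature axis (`np.concatenate(list(specs.values()), axis=1)`) is one option for a fully connected network, but the abstract does not say how the paper combines them.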

Similar sources

Combining pattern recognition and deep-learning-based algorithms to automatically detect commercial quadcopters using audio signals (Research Article)

Commercial quadcopters with many private, commercial, and public sector applications are a rapidly advancing technology. Currently, there is no guarantee to facilitate the safe operation of these devices in the community. Three different automatic commercial quadcopters identification methods are presented in this paper. Among these three techniques, two are based on deep neural networks in whi...

DNN Transfer Learning Based Non-Linear Feature Extraction for Acoustic Event Classification

Recent acoustic event classification research has focused on training suitable filters to represent acoustic events. However, due to limited availability of target event databases and linearity of conventional filters, there is still room for improving performance. By exploiting the non-linear modeling of deep neural networks (DNNs) and their ability to learn beyond pre-trained environments, th...

Learning multi-labeled bioacoustic samples with an unsupervised feature learning approach

Multi-label Bird Species Classification competition provides an excellent opportunity to analyze the effectiveness of acoustic processing and multilabel learning. We propose an unsupervised feature extraction and generation approach based on latest advances in deep neural network learning, which can be applied generically to acoustic data. With state-of-the-art approaches from multilabel learni...

Robust Features in Deep-Learning-Based Speech Recognition

Recent progress in deep learning has revolutionized speech recognition research, with Deep Neural Networks (DNNs) becoming the new state of the art for acoustic modeling. DNNs offer significantly lower speech recognition error rates compared to those provided by the previously used Gaussian Mixture Models (GMMs). Unfortunately, DNNs are data sensitive, and unseen data conditions can deteriorate...

Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis

In this paper, we investigate the effectiveness of speaker adaptation for various essential components in deep neural network based speech synthesis, including acoustic models, acoustic feature extraction, and post-filters. In general, a speaker adaptation technique, e.g., maximum likelihood linear regression (MLLR) for HMMs or learning hidden unit contributions (LHUC) for DNNs, is applied to a...

Publication date: 2015